feat: add parallelization filters #4144
Conversation
Signed-off-by: hongkuanz <[email protected]>
…/parallel-filter
Signed-off-by: hongkuanz <[email protected]>
Walkthrough

This pull request refactors the profiler system to centralize model metadata into a ModelInfo dataclass, introduces parallel-mapping support for per-GPU profiling sweeps, moves ConfigModifierProtocol to a dedicated protocol module, and updates the search-space auto-generation logic with clearer GPU-discovery control flow and conditional compilation.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:
Pre-merge checks
❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 4
🧹 Nitpick comments (7)
tests/profiler/test_profile_sla_aiconfigurator.py (2)
21-22: Drop redundant noqa. Ruff now flags `# noqa: E402` as unused (RUF100). Removing the directive keeps lint clean without changing behavior. Based on static analysis hints.

```diff
-from benchmarks.profiler.utils.model_info import ModelInfo  # noqa: E402
+from benchmarks.profiler.utils.model_info import ModelInfo
```
26-33: Remove unused fixture argument. `request` is no longer used here and triggers Ruff ARG001; dropping it keeps the autouse override while satisfying lint. Based on static analysis hints.

```diff
-@pytest.fixture(autouse=True)
-def logger(request):
+@pytest.fixture(autouse=True)
+def logger():
```

benchmarks/profiler/utils/plot.py (1)
48-54: Docstring parameter rename. The docstring still refers to `mapping_labels`, but the function now accepts `parallel_mapping_labels`. Updating the wording keeps the docs aligned with the signature.

```diff
-    mapping_labels: optional list of strings describing parallelization mapping per point
+    parallel_mapping_labels: optional list of strings describing parallelization mapping per point
```

tests/profiler/test_profile_sla_dryrun.py (2)
22-24: Drop redundant noqa. Same as in the aiconfigurator tests: Ruff reports RUF100 for this `# noqa: E402`, so we can remove it without changing behavior. Based on static analysis hints.

```diff
-from benchmarks.profiler.utils.model_info import ModelInfo  # noqa: E402
+from benchmarks.profiler.utils.model_info import ModelInfo
```
30-37: Remove unused fixture argument. This autouse override doesn't use `request`; trimming it avoids Ruff ARG001 while preserving the intended behavior. Based on static analysis hints.

```diff
-@pytest.fixture(autouse=True)
-def logger(request):
+@pytest.fixture(autouse=True)
+def logger():
```

benchmarks/profiler/profile_sla.py (1)
118-132: Simplify sweep_max_context_length logic. The `hasattr()` check on line 120 is unnecessary since `model_info` is always a `ModelInfo` object with a `max_context_length` attribute (possibly None).

Apply this diff to simplify:

```diff
-    # Determine sweep max context length: allow user-provided cap to override model's if smaller
-    sweep_max_context_length = getattr(args, "max_context_length", None)
-    if hasattr(args, "model_info") and args.model_info is not None:
-        model_max_ctx = args.model_info.max_context_length
-        if sweep_max_context_length is None:
-            sweep_max_context_length = model_max_ctx
-        elif model_max_ctx is not None and model_max_ctx < sweep_max_context_length:
+    # Determine sweep max context length: use user-provided cap, constrained by model's maximum
+    sweep_max_context_length = getattr(args, "max_context_length", None)
+    model_max_ctx = args.model_info.max_context_length
+    if sweep_max_context_length is None:
+        sweep_max_context_length = model_max_ctx
+    elif model_max_ctx is not None and model_max_ctx < sweep_max_context_length:
         logger.info(
             f"User-provided max_context_length={sweep_max_context_length} exceeds model's maximum {model_max_ctx}; using model maximum."
         )
         sweep_max_context_length = model_max_ctx
-        if sweep_max_context_length is None:
-            logger.warning(
-                "No max_context_length available from args or model; proceeding without a cap."
-            )
+    if sweep_max_context_length is None:
+        logger.warning(
+            "No max_context_length available from args or model; proceeding without a cap."
+        )
```

benchmarks/profiler/utils/search_space_autogen.py (1)
85-89: Clarify error message for missing model. The error message "No model provided, cannot auto-generate GPU search space" is misleading because line 64 already attempts to extract the model name from the config if not explicitly provided. A clearer message would indicate failure to extract or determine the model.

Consider this diff:

```diff
     if not args.model:
-        # TODO: get model info provided DGD config
-        error_msg = "No model provided, cannot auto-generate GPU search space. Please provide `--model` or GPU info"
+        error_msg = "Failed to determine model name from config. Cannot auto-generate GPU search space. Please provide `--model` explicitly or specify GPU parameters."
         logger.error(error_msg)
         raise RuntimeError(error_msg)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- benchmarks/profiler/profile_sla.py (16 hunks)
- benchmarks/profiler/utils/config.py (1 hunks)
- benchmarks/profiler/utils/config_modifiers/__init__.py (1 hunks)
- benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py (1 hunks)
- benchmarks/profiler/utils/config_modifiers/protocol.py (1 hunks)
- benchmarks/profiler/utils/config_modifiers/sglang.py (1 hunks)
- benchmarks/profiler/utils/model_info.py (5 hunks)
- benchmarks/profiler/utils/plot.py (2 hunks)
- benchmarks/profiler/utils/profiler_argparse.py (2 hunks)
- benchmarks/profiler/utils/search_space_autogen.py (3 hunks)
- tests/profiler/test_profile_sla_aiconfigurator.py (2 hunks)
- tests/profiler/test_profile_sla_dryrun.py (12 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
benchmarks/profiler/utils/config_modifiers/__init__.py (1)
- benchmarks/profiler/utils/config_modifiers/protocol.py (1): ConfigModifierProtocol (21-84)

benchmarks/profiler/utils/profiler_argparse.py (1)
- benchmarks/profiler/utils/search_space_autogen.py (1): auto_generate_search_space (32-140)

benchmarks/profiler/utils/plot.py (1)
- benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py (1): label (31-38)

tests/profiler/test_profile_sla_aiconfigurator.py (1)
- benchmarks/profiler/utils/model_info.py (1): ModelInfo (107-114)

tests/profiler/test_profile_sla_dryrun.py (1)
- benchmarks/profiler/utils/model_info.py (1): ModelInfo (107-114)

benchmarks/profiler/utils/config_modifiers/protocol.py (3)
- components/src/dynamo/planner/defaults.py (1): SubComponentType (140-142)
- benchmarks/profiler/utils/config_modifiers/sglang.py (10): convert_config (82-186), set_config_tp_size (189-210), set_config_tep_size (213-245), set_config_dep_size (248-280), get_model_name (283-308), get_port (311-339), get_kv_cache_size_from_dynamo_log (342-355), load_default_config (44-46), update_model (49-74), update_image (77-79)
- benchmarks/profiler/utils/config.py (1): update_image (360-380)

benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py (3)
- benchmarks/profiler/utils/model_info.py (1): ModelInfo (107-114)
- benchmarks/profiler/utils/config_modifiers/protocol.py (3): set_config_tp_size (32-38), set_config_tep_size (41-48), set_config_dep_size (51-58)
- benchmarks/profiler/utils/config_modifiers/sglang.py (3): set_config_tp_size (189-210), set_config_tep_size (213-245), set_config_dep_size (248-280)

benchmarks/profiler/profile_sla.py (6)
- benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py (4): ParallelizationMapping (22-38), apply_parallel_mapping_to_config (156-172), get_candidate_parallel_mappings (41-153), label (31-38)
- benchmarks/profiler/utils/profile_cache.py (4): check_prefill_results_exist (26-53), load_existing_prefill_results (91-108), check_decode_results_exist (56-88), load_existing_decode_results (111-138)
- benchmarks/profiler/utils/estimate_perf.py (4): estimate_prefill_perf (132-153), get_max_batch_size (155-209), estimate_perf (74-130), get_max_kv_tokens (211-231)
- deploy/utils/dynamo_deployment.py (5): DynamoDeploymentClient (98-495), create_deployment (220-285), get_deployment_logs (444-475), get_service_url (212-218), delete_deployment (477-495)
- benchmarks/profiler/utils/plot.py (1): plot_prefill_performance (36-84)
- benchmarks/profiler/utils/profile_decode.py (2): get_num_request_range (25-37), profile_decode_aiconfigurator (143-171)

benchmarks/profiler/utils/search_space_autogen.py (5)
- benchmarks/profiler/utils/model_info.py (2): ModelInfo (107-114), get_model_info (117-229)
- benchmarks/profiler/utils/config_modifiers/sglang.py (1): get_model_name (283-308)
- benchmarks/profiler/utils/config_modifiers/trtllm.py (1): get_model_name (282-301)
- benchmarks/profiler/utils/config_modifiers/vllm.py (1): get_model_name (231-250)
- deploy/utils/gpu_inventory.py (1): get_gpu_summary (387-417)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4144/merge) by tedzhouhk.
benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py
[error] 1-1: pre-commit isort hook modified the file.
[error] 1-1: pre-commit black hook reformatted the file.
[error] 134-134: Ruff: Ambiguous variable name: I (E741). Consider using a more descriptive name.
🪛 Ruff (0.14.3)
tests/profiler/test_profile_sla_aiconfigurator.py
21-21: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
26-26: Unused function argument: request
(ARG001)
tests/profiler/test_profile_sla_dryrun.py
22-22: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py
136-136: Ambiguous variable name: I
(E741)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (14)
benchmarks/profiler/utils/model_info.py (4)
107-114: LGTM: Clean data container with appropriate types. The ModelInfo Pydantic model provides a well-structured container for model metadata with clear field types and optional defaults.
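For readers skimming the diff, a minimal sketch of what such a Pydantic container could look like; aside from max_context_length and quantization_block_size, which this review references, the field names are illustrative assumptions rather than the PR's actual definition:

```python
# Hypothetical sketch of a ModelInfo-style Pydantic container; fields other
# than max_context_length and quantization_block_size are assumptions.
from typing import Optional

from pydantic import BaseModel


class ModelInfoSketch(BaseModel):
    name: str
    is_moe: bool = False  # assumed flag; MoE-awareness appears elsewhere in this review
    max_context_length: Optional[int] = None
    quantization_block_size: Optional[int] = None


info = ModelInfoSketch(name="example/model", max_context_length=32768)
print(info.max_context_length)  # 32768
```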
128-131: Good refactoring: eliminates config mutation. Using a local variable instead of mutating the config object is cleaner and avoids side effects.
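A tiny illustration of the pattern being praised; the function and key names here are hypothetical:

```python
# Illustrative contrast only; names are hypothetical.
def read_max_ctx_mutating(config: dict) -> int:
    # mutates the caller's config as a side effect
    config["max_context_length"] = config.get("max_context_length", 4096)
    return config["max_context_length"]


def read_max_ctx_pure(config: dict) -> int:
    # local read; the caller's config is left untouched
    return config.get("max_context_length", 4096)
```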
215-219: Verify that taking max() of block size list is semantically correct. The code handles the case where `quantization_block_size` is a list by taking the maximum value. Ensure this is the correct interpretation for all quantization schemes that return lists (e.g., separate input/output block sizes).

Consider adding a comment explaining why max() is chosen:

```diff
-    # Handle case where block size is a list (e.g., [128, 128] for [input, output] block sizes)
+    # Handle case where block size is a list (e.g., [128, 128] for [input, output] block sizes).
+    # We take the maximum to ensure the most conservative block size constraint for validation.
     if (
         isinstance(quantization_block_size, list)
         and len(quantization_block_size) > 0
     ):
         quantization_block_size = max(quantization_block_size)
```
221-229: LGTM: Clean ModelInfo construction. The return statement clearly maps all detected model attributes to the ModelInfo fields.
benchmarks/profiler/profile_sla.py (4)
437-442: LGTM: Correct attention_dp_size calculation for DEP. The logic correctly sets `attention_dp_size` to `num_gpus` when DEP is used (data parallelism across experts), and defaults to 1 for TP-based configurations.
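The described rule reduces to a one-liner; a hedged sketch with assumed names:

```python
# Sketch of the rule the review describes: DEP spreads attention data
# parallelism across all GPUs; TP-based mappings keep it at 1.
def compute_attention_dp_size(num_gpus: int, uses_dep: bool) -> int:
    return num_gpus if uses_dep else 1


assert compute_attention_dp_size(8, uses_dep=True) == 8
assert compute_attention_dp_size(8, uses_dep=False) == 1
```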
537-593: LGTM: Sound mapping selection strategy. The best mapping selection logic correctly prioritizes:
- Meeting latency targets (TTFT/ITL)
- Maximizing throughput per GPU among valid configurations
The dry-run fallback with TP mapping is appropriate.
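In rough Python, that priority order might look like the sketch below; the record fields are assumed names, not the PR's actual API, and the PR's dry-run TP fallback is not modeled here:

```python
# Illustrative selection following the described strategy: keep candidates
# that meet both latency targets, then maximize throughput per GPU.
from dataclasses import dataclass


@dataclass
class CandidateResult:
    label: str
    ttft: float  # time to first token, ms
    itl: float  # inter-token latency, ms
    throughput_per_gpu: float


def select_best_mapping(candidates, ttft_target, itl_target):
    valid = [c for c in candidates if c.ttft <= ttft_target and c.itl <= itl_target]
    if not valid:
        return None  # caller decides how to fall back (e.g., dry-run TP mapping)
    return max(valid, key=lambda c: c.throughput_per_gpu)
```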
595-676: LGTM: Correct interpolation with best prefill mapping. The prefill interpolation phase correctly:
- Applies the best selected mapping to the config
- Uses `sweep_max_context_length` for bounds
- Provides an appropriate `tp_size` fallback for the AI configurator (line 629)
678-770: LGTM: Correct interpolation with best decode mapping. The decode interpolation phase correctly:
- Applies the best selected mapping to the config
- Computes `attention_dp_size` appropriately for DEP configurations
- Provides an appropriate `tp_size` fallback for the AI configurator

benchmarks/profiler/utils/config_modifiers/parallelization_mapping.py (3)
21-38: LGTM: Clean parallelization mapping dataclass. The frozen dataclass design is appropriate for this value object. The `label()` method provides clear human-readable descriptions for logging and plotting.
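As a rough picture of the shape being reviewed, a frozen value object with a label() helper could look like this; the exact fields and label format are assumptions drawn only from this review's descriptions:

```python
# Sketch in the spirit of the reviewed ParallelizationMapping; fields and
# label() output are assumptions, not the PR's actual definition.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MappingSketch:
    tp: Optional[int] = None  # tensor parallel
    tep: Optional[int] = None  # tensor-expert parallel
    dep: Optional[int] = None  # data-expert parallel

    def label(self) -> str:
        for name in ("tp", "tep", "dep"):
            size = getattr(self, name)
            if size is not None:
                return f"{name.upper()}={size}"
        return "unmapped"


print(MappingSketch(tep=4).label())  # TEP=4
```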
41-66: LGTM: Clear MoE-aware candidate generation. The function correctly generates different parallelization strategies based on model type (MoE vs. dense) and phase (prefill vs. decode).
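Following that description, candidate generation might branch like this (reusing MappingSketch from the sketch above; the specific strategy set per model type is an assumption, and the real function also varies by phase):

```python
# Hypothetical candidate generation: dense models only get TP; MoE models
# additionally get TEP/DEP candidates.
def candidate_mappings(num_gpus: int, is_moe: bool) -> list:
    candidates = [MappingSketch(tp=num_gpus)]
    if is_moe:
        candidates.append(MappingSketch(tep=num_gpus))
        candidates.append(MappingSketch(dep=num_gpus))
    return candidates
```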
156-172: LGTM: Correct mapping application logic. The function correctly delegates to the appropriate config modifier method based on the mapping type (TP/TEP/DEP) and phase.
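The delegation reduces to a dispatch on whichever mapping dimension is set; the set_config_* method names below appear in this review's code-graph section, though the (config, size) signature is an assumption:

```python
# Dispatch sketch: pick the config-modifier method matching the mapping
# dimension, using MappingSketch from the earlier sketch.
def apply_mapping(config, modifier, mapping: MappingSketch):
    if mapping.tp is not None:
        return modifier.set_config_tp_size(config, mapping.tp)
    if mapping.tep is not None:
        return modifier.set_config_tep_size(config, mapping.tep)
    if mapping.dep is not None:
        return modifier.set_config_dep_size(config, mapping.dep)
    return config  # no mapping set; leave the config unchanged
```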
benchmarks/profiler/utils/search_space_autogen.py (3)
37-58: LGTM: Cleaner config loading flow. The refactored config loading logic is more straightforward: always load the config first, then optionally update it.
60-76: LGTM: Centralized model info retrieval. The refactoring centralizes model info retrieval and stores it in `args.model_info` for use throughout the profiling flow. The informative logging is helpful for debugging.
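The pattern amounts to resolving model metadata once and stashing it on the args namespace; a minimal illustration using ModelInfoSketch from earlier (in the PR the object comes from get_model_info(), whose signature is not shown here):

```python
# Resolve once, read everywhere: a sketch of the centralization pattern.
import argparse

args = argparse.Namespace(model="example/model")
# Constructed directly for illustration; the PR populates this via get_model_info().
args.model_info = ModelInfoSketch(name=args.model, max_context_length=32768)
```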
123-139: LGTM: Sensible default GPU configuration. The fallback logic provides reasonable default values (min=1, max=4, gpus_per_node=8) when GPU discovery is disabled, with clear logging.
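A hedged sketch of that fallback; the dictionary keys are assumptions, but the default values (1, 4, 8) come straight from the review:

```python
# Fallback defaults when GPU discovery is disabled; key names are assumed.
from typing import Optional


def default_gpu_search_space(discovered: Optional[dict]) -> dict:
    if discovered is not None:
        return discovered  # use discovery results when available
    return {
        "min_num_gpus_per_engine": 1,
        "max_num_gpus_per_engine": 4,
        "gpus_per_node": 8,
    }
```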
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
…/parallel-filter
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
…dynamo into hzhou/parallel-filter
keivenchang left a comment
thanks for commenting out the test
Summary by CodeRabbit

Release Notes

New Features
- Parallelization-mapping filters enabling per-GPU profiling sweeps

Improvements
- Clearer GPU-discovery control flow in search-space auto-generation

Refactor
- Model metadata centralized into a ModelInfo dataclass
- ConfigModifierProtocol moved to a dedicated protocol module